AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Neural Information Processing SystemsFeb-16-2026, 05:20:10 GMT

82389fbff376d1e8aec510916d50d054-Paper-Conference.pdf

machine learning, natural language, reinforcement learning, (15 more...)

Country:

Europe > Hungary > Budapest > Budapest (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Neural Information Processing SystemsFeb-12-2026, 18:26:51 GMT

LAPO: Latent-VariableAdvantage-WeightedPolicy OptimizationforOfflineReinforcementLearning

But in practice, it requires querying the behavior policy which is unknown, and using an erroneous approximation of the behavior policy can negatively affect the performance ([39]).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Asia > China (0.04)
North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Yogev Bar-On, Yishay Mansour

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

Neural Information Processing SystemsFeb-12-2026, 14:17:08 GMT

Neural Information Processing Systems http://nips.cc/

agent, algorithm, graph, (17 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.84)

arXiv.org Artificial IntelligenceDec-3-2025

SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction

Wu, Shengkai, Yang, Jinrong, Luo, Wenqiu, Gao, Linfeng, Shang, Chaohui, Zhi, Meiyu, Sun, Mingshan, Yang, Fangping, Ren, Liangliang, Zhao, Yong

Imitation learning for robotic grasping is often plagued by the multimodal problem: when a scene contains multiple valid targets, demonstrations of grasping different objects create conflicting training signals. Standard imitation learning policies fail by averaging these distinct actions into a single, invalid action. In this paper, we introduce SAM2Grasp, a novel framework that resolves this issue by reformulating the task as a uni-modal, prompt-conditioned prediction problem. Our method leverages the frozen SAM2 model to use its powerful visual temporal tracking capability and introduces a lightweight, trainable action head that operates in parallel with its native segmentation head. This design allows for training only the small action head on pre-computed temporal-visual features from SAM2. During inference, an initial prompt, such as a bounding box provided by an upstream object detection model, designates the specific object to be grasped. This prompt conditions the action head to predict a unique, unambiguous grasp trajectory for that object alone. In all subsequent video frames, SAM2's built-in temporal tracking capability automatically maintains stable tracking of the selected object, enabling our model to continuously predict the grasp trajectory from the video stream without further external guidance. This temporal-prompted approach effectively eliminates ambiguity from the visuomotor policy. We demonstrate through extensive experiments that SAM2Grasp achieves state-of-the-art performance in cluttered, multi-object grasping tasks.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

2512.02609

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial IntelligenceDec-2-2025

RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models

Zhang, Hongyin, Zhang, Shuo, Jin, Junxi, Zeng, Qixin, Li, Runze, Wang, Donglin

Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight the importance of principled robustness-aware RL post-training as a key step toward improving the reliability and robustness of VLA models.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2511.01331

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Neural Information Processing SystemsNov-19-2025, 17:07:08 GMT

b0ca717599b7ba84d5e4f4c8b1ef6657-Paper-Conference.pdf

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial IntelligenceNov-18-2025

One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow

Wang, Zeyuan, Li, Da, Chen, Yulin, Shi, Ye, Bai, Liang, Yu, Tianyuan, Fu, Yanwei

We introduce a one-step generative policy for offline reinforcement learning that maps noise directly to actions via a residual reformulation of MeanFlow, making it compatible with Q-learning. While one-step Gaussian policies enable fast inference, they struggle to capture complex, multimodal action distributions. Existing flow-based methods improve expressivity but typically rely on distillation and two-stage training when trained with Q-learning. To overcome these limitations, we propose to reformulate MeanFlow to enable direct noise-to-action generation by integrating the velocity field and noise-to-action transformation into a single policy network-eliminating the need for separate velocity estimation. We explore several reformulation variants and identify an effective residual formulation that supports expressive and stable policy learning. Our method offers three key advantages: 1) efficient one-step noise-to-action generation, 2) expressive modelling of multimodal action distributions, and 3) efficient and stable policy learning via Q-learning in a single-stage training setup. Extensive experiments on 73 tasks across the OGBench and D4RL benchmarks demonstrate that our method achieves strong performance in both offline and offline-to-online reinforcement learning settings. Code is available at https://github.com/HiccupRL/MeanFlowQL.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2511.13035

Country: Asia (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceOct-29-2025

Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation

Zhang, Xiucheng, Jiang, Yang, Qing, Hongwei, Bai, Jiashuo

Perceptual ambiguity and task conflict limit multitask robotic manipulation via imitation learning. We propose a framework combining a Language-Conditioned Visual Representation (LCVR) module and a Language-conditioned Mixture-ofExperts Density Policy (LMoE-DP). LCVR resolves perceptual ambiguities by grounding visual features with language instructions, enabling differentiation between visually similar tasks. To mitigate task conflict, LMoE-DP uses a sparse expert architecture to specialize in distinct, multimodal action distributions, stabilized by gradient modulation. On real-robot benchmarks, LCVR boosts Action Chunking with Transformers (ACT) and Diffusion Policy (DP) success rates by 33.75% and 25%, respectively. The full framework achieves a 79% average success, outperforming the advanced baseline by 21%. Our work shows that combining semantic grounding and expert specialization enables robust, efficient multi-task manipulation

artificial intelligence, machine learning, manipulation, (16 more...)

2510.24055

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)